
    The JCilk multithreaded language

    Thesis (S.M.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2005. Includes bibliographical references (p. 103-107).

    JCilk is a Java-based multithreaded programming language that extends Java with a dynamic threading model. Specifically, JCilk imports Cilk's fork-join primitives spawn and sync into Java to provide procedure-call semantics for concurrent subcomputations. More importantly, JCilk integrates exception handling with multithreading by defining semantics consistent with Java's existing exception-handling semantics. This strategy yields some surprising semantic synergies. In particular, JCilk extends Java's exception semantics to allow exceptions to be passed from a spawned method to its parent in a natural way that obviates the need for Cilk's inlet and abort constructs. This extension is "faithful" in that it obeys Java's ordinary serial semantics when executed on a single processor. When executed in parallel, however, an exception thrown by a JCilk computation signals its sibling computations to abort, yielding a clean semantics in which only a single exception from the enclosing try block is handled. To minimize the complexity of reasoning about aborts, JCilk signals them "semisynchronously" so that abort signals do not interrupt ordinary serial code. Because JCilk uses Java's normal exception mechanism to propagate an abort throughout a subcomputation, the programmer can handle clean-up by simply catching a thrown CilkAbort exception. This thesis documents in detail the designed semantics, the linguistic decisions we made, and their justifications. It also describes the structure of the JCilk compiler and how it supports the exception semantics. Specifically, the JCilk compiler performs a two-stage compilation process to support the continuation mechanism required by the runtime system's work-stealing algorithm. By performing static analysis, the compiler generates code to support the "catchlet" and "finallet" mechanisms for handling exceptions. The design of JCilk represents joint research with John S. Danaher and Charles E. Leiserson.

    by I-Ting Angelina Lee. S.M.
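    JCilk's spawn and sync keywords are not available in plain Java, but the exception behavior described above can be approximated to make it concrete. Below is a minimal sketch in standard Java, using ExecutorService and Future as stand-ins for machinery that JCilk makes implicit: the child's exception surfaces in the parent at the join point, and the parent then cancels the sibling, much as JCilk's implicit abort does.

```java
import java.util.concurrent.*;

// A plain-Java approximation of the behavior JCilk provides implicitly:
// a "spawned" child's exception reaches the parent at the join ("sync")
// point, and the parent then aborts the sibling computation.
public class SpawnSyncSketch {
    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(2);
        Future<Integer> a = pool.submit((Callable<Integer>) () -> {
            throw new IllegalStateException("child failed");
        });
        Future<Integer> b = pool.submit(() -> {
            Thread.sleep(10_000);      // long-running sibling computation
            return 42;
        });
        try {
            a.get();                   // the "sync": child's exception surfaces here
        } catch (ExecutionException e) {
            b.cancel(true);            // abort the sibling, as JCilk does implicitly
            System.out.println("caught: " + e.getCause().getMessage());
        } finally {
            pool.shutdownNow();
        }
    }
}
```

    In JCilk itself none of this plumbing is written by the programmer: the exception propagates at the sync, and the aborted sibling observes a CilkAbort exception that it can catch for clean-up.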

    Programming with Exceptions in JCilk

    JCilk extends the Java language to provide call-return semantics for multithreading, much as Cilk does for C. Java's built-in thread model does not support the passing of exceptions or return values from one thread back to the "parent" thread that created it. JCilk imports Cilk's fork-join primitives spawn and sync into Java to provide procedure-call semantics for concurrent subcomputations. This paper shows how JCilk integrates exception handling with multithreading by defining semantics consistent with the existing semantics of Java's try and catch constructs, but which handle concurrency in spawned methods. JCilk's strategy of integrating multithreading with Java's exception semantics yields some surprising semantic synergies. In particular, JCilk extends Java's exception semantics to allow exceptions to be passed from a spawned method to its parent in a natural way that obviates the need for Cilk's inlet and abort constructs. This extension is "faithful" in that it obeys Java's ordinary serial semantics when executed on a single processor. When executed in parallel, however, an exception thrown by a JCilk computation signals its sibling computations to abort, which yields a clean semantics in which only a single exception from the enclosing try block is handled. The decision to implicitly abort side computations opens a Pandora's box of subsidiary linguistic problems to be resolved, however. For instance, aborting might cause a computation to be interrupted asynchronously, causing havoc in programmer understanding of code behavior. To minimize the complexity of reasoning about aborts, JCilk signals them "semisynchronously" so that abort signals do not interrupt ordinary serial code. In addition, JCilk propagates an abort signal throughout a subcomputation naturally with a built-in CilkAbort exception, thereby allowing programmers to handle clean-up by simply catching the CilkAbort exception. The semantics of JCilk allow programs with speculative computations to be programmed easily. Speculation is essential for parallelizing programs such as branch-and-bound or heuristic search. We show how JCilk's linguistic mechanisms can be used to program a solution to the "queens" problem and an implementation of a parallel alpha-beta search.

    Singapore-MIT Alliance (SMA)
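    To make the speculation pattern concrete, the sketch below uses plain Java's invokeAny, which returns the first branch to succeed and cancels the rest, roughly the effect the paper achieves with spawn plus implicit abort. The searchBranch function and its timings are invented for illustration; this is not the paper's queens or alpha-beta code.

```java
import java.util.List;
import java.util.concurrent.*;

// Speculative search in plain Java: launch several branches, keep the
// first result, cancel the losers. (Illustrative stand-in for JCilk's
// spawn-plus-abort idiom; searchBranch is a made-up placeholder.)
public class SpeculativeSearch {
    static int searchBranch(int branch) throws InterruptedException {
        Thread.sleep(100L * branch);   // pretend branch i takes i*100 ms
        return branch * branch;        // pretend "solution" found by branch i
    }

    public static void main(String[] args) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(4);
        List<Callable<Integer>> branches = List.of(
                () -> searchBranch(3),
                () -> searchBranch(1), // fastest branch wins
                () -> searchBranch(2));
        // invokeAny blocks until one branch succeeds, then cancels the others.
        System.out.println("first solution: " + pool.invokeAny(branches));
        pool.shutdown();
    }
}
```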

    Efficient Race Detection with Futures

    This paper addresses the problem of provably efficient and practically good on-the-fly determinacy race detection in task-parallel programs that use futures. Prior work on determinacy race detection has mostly focused either on task-parallel programs that follow a series-parallel dependence structure or on ones with unrestricted use of futures that generate arbitrary dependences. In this work, we consider a restricted use of futures and show that it can be race-detected more efficiently than general use of futures. Specifically, we present two algorithms: MultiBags and MultiBags+. MultiBags targets programs that use futures in a restricted fashion and runs in time $O(T_1 \alpha(m,n))$, where $T_1$ is the sequential running time of the program, $\alpha$ is the inverse Ackermann's function, $m$ is the total number of memory accesses, and $n$ is the dynamic count of places at which parallelism is created. Since $\alpha$ is a very slowly growing function (upper bounded by $4$ for all practical purposes), it can be treated as a close-to-constant overhead. MultiBags+ is an extension of MultiBags that targets programs with general use of futures. It runs in time $O((T_1+k^2)\alpha(m,n))$, where $T_1$, $\alpha$, $m$, and $n$ are defined as before, and $k$ is the number of future operations in the computation. We implemented both algorithms and empirically demonstrate their efficiency.
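    The inverse-Ackermann factor in both bounds is the signature of a disjoint-set (union-find) data structure, the same machinery behind the classic SP-bags race detector that the "bags" in MultiBags echo. The sketch below is only that standard structure, with union by rank and path compression, which achieves the $O(\alpha(m,n))$ amortized cost per operation; it is not the MultiBags algorithm itself.

```java
// Standard disjoint-set (union-find) with union by rank and path
// compression: the source of the alpha(m,n) amortized bound. This is
// supporting machinery, not the MultiBags race-detection algorithm.
public class DisjointSet {
    private final int[] parent, rank;

    public DisjointSet(int n) {
        parent = new int[n];
        rank = new int[n];
        for (int i = 0; i < n; i++) parent[i] = i;  // each element is its own set
    }

    public int find(int x) {                        // path compression
        if (parent[x] != x) parent[x] = find(parent[x]);
        return parent[x];
    }

    public void union(int a, int b) {               // union by rank
        int ra = find(a), rb = find(b);
        if (ra == rb) return;
        if (rank[ra] < rank[rb]) { int t = ra; ra = rb; rb = t; }
        parent[rb] = ra;
        if (rank[ra] == rank[rb]) rank[ra]++;
    }
}
```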

    Memory abstractions for parallel programming

    Thesis (Ph.D.)--Massachusetts Institute of Technology, Dept. of Electrical Engineering and Computer Science, 2012. Cataloged from PDF version of thesis. Includes bibliographical references (p. 156-163).

    A memory abstraction is an abstraction layer between the program execution and the memory that provides a different "view" of a memory location depending on the execution context in which the memory access is made. Properly designed memory abstractions help ease the task of parallel programming by mitigating the complexity of synchronization or admitting more efficient use of resources. This dissertation describes five memory abstractions for parallel programming: (i) cactus stacks that interoperate with linear stacks, (ii) efficient reducers, (iii) reducer arrays, (iv) ownership-aware transactions, and (v) location-based memory fences. To demonstrate the utility of memory abstractions, my collaborators and I developed Cilk-M, a dynamically multithreaded concurrency platform which embodies the first three memory abstractions. Many dynamic multithreaded concurrency platforms incorporate cactus stacks to support multiple stack views for all the active children simultaneously. The use of cactus stacks, albeit essential, forces concurrency platforms to trade off between performance, memory consumption, and interoperability with serial code, due to their incompatibility with linear stacks. This dissertation proposes a new strategy to build a cactus stack using thread-local memory mapping (or TLMM), which enables Cilk-M to satisfy all three criteria simultaneously. A reducer hyperobject allows different branches of a dynamic multithreaded program to maintain coordinated local views of the same nonlocal variable. With reducers, one can use nonlocal variables in a parallel computation without restructuring the code or introducing races. This dissertation introduces memory-mapped reducers, which admit much more efficient access compared to existing implementations. When used in large quantities, reducers incur unnecessarily high overhead in execution time and space consumption. This dissertation describes support for reducer arrays, which offers the same functionality as an array of reducers with significantly less overhead. Transactional memory is a high-level synchronization mechanism, designed to be easier to use and more composable than fine-grained locking. This dissertation presents ownership-aware transactions, the first transactional memory design that provides provable safety guarantees for "open-nested" transactions. On architectures that implement memory models weaker than sequential consistency, programs communicating via shared memory must employ memory fences to ensure correct execution. This dissertation examines the concept of location-based memory fences, which, unlike traditional memory fences, incur latency only when synchronization is necessary.

    by I-Ting Angelina Lee. Ph.D.
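    To illustrate the reducer concept itself (not Cilk-M's memory-mapped implementation), the sketch below gives each worker its own local view of a list and combines the views afterward with an associative reduce (concatenation in worker order), so the hot path needs no lock and the final order is deterministic.

```java
import java.util.ArrayList;
import java.util.List;

// A plain-Java sketch of the reducer idea: per-worker local views plus an
// associative combine. Cilk-M's reducers do this transparently and at the
// granularity of work-stealing, which this sketch does not attempt.
public class ListReducerSketch {
    public static void main(String[] args) throws InterruptedException {
        int workers = 4;
        List<List<Integer>> views = new ArrayList<>();
        Thread[] threads = new Thread[workers];
        for (int w = 0; w < workers; w++) {
            List<Integer> view = new ArrayList<>(); // this worker's local view
            views.add(view);
            final int id = w;
            threads[w] = new Thread(() -> {
                for (int i = 0; i < 3; i++) view.add(id * 10 + i);
            });
            threads[w].start();
        }
        for (Thread t : threads) t.join();
        List<Integer> result = new ArrayList<>();       // associative reduce:
        for (List<Integer> v : views) result.addAll(v); // concatenate views in order
        System.out.println(result);                     // deterministic output
    }
}
```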

    On-the-fly pipeline parallelism

    Pipeline parallelism organizes a parallel program as a linear sequence of $s$ stages. Each stage processes elements of a data stream, passing each processed data element to the next stage, and then taking on a new element before the subsequent stages have necessarily completed their processing. Pipeline parallelism is used especially in streaming applications that perform video, audio, and digital signal processing. Three out of 13 benchmarks in PARSEC, a popular software benchmark suite designed for shared-memory multiprocessors, can be expressed as pipeline parallelism. Whereas most concurrency platforms that support pipeline parallelism use a "construct-and-run" approach, this paper investigates "on-the-fly" pipeline parallelism, where the structure of the pipeline emerges as the program executes rather than being specified a priori. On-the-fly pipeline parallelism allows the number of stages to vary from iteration to iteration and dependencies to be data dependent. We propose simple linguistics for specifying on-the-fly pipeline parallelism and describe a provably efficient scheduling algorithm, the Piper algorithm, which integrates pipeline parallelism into a work-stealing scheduler, allowing pipeline and fork-join parallelism to be arbitrarily nested. The Piper algorithm automatically throttles the parallelism, precluding "runaway" pipelines. Given a pipeline computation with $T_1$ work and $T_\infty$ span (critical-path length), Piper executes the computation on $P$ processors in expected time $T_P \le T_1/P + O(T_\infty + \lg P)$. Piper also limits stack space, ensuring that it does not grow unboundedly with running time. We have incorporated on-the-fly pipeline parallelism into a Cilk-based work-stealing runtime system. Our prototype Cilk-P implementation exploits optimizations such as lazy enabling and dependency folding. We have ported the three PARSEC benchmarks that exhibit pipeline parallelism to run on Cilk-P. One of these, x264, cannot readily be executed by systems that support only construct-and-run pipeline parallelism. Benchmark results indicate that Cilk-P has low serial overhead and good scalability. On x264, for example, Cilk-P exhibits a speedup of 13.87 over its serial counterpart when running on 16 processors.

    National Science Foundation (U.S.) (Grant CNS-1017058); National Science Foundation (U.S.) (Grant CCF-1162148); National Science Foundation (U.S.) Graduate Research Fellowship
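    For contrast, the sketch below hand-builds a construct-and-run pipeline in plain Java: the stage structure is fixed before execution and cannot change per iteration, which is precisely the restriction that on-the-fly pipelining removes. The two stages, queue capacity, and data are illustrative.

```java
import java.util.concurrent.*;

// A fixed two-stage ("construct-and-run") pipeline: stage 1 produces,
// stage 2 consumes, connected by a bounded queue that also throttles
// the producer. The structure is frozen before execution begins.
public class TwoStagePipeline {
    public static void main(String[] args) throws InterruptedException {
        BlockingQueue<Integer> queue = new ArrayBlockingQueue<>(4);
        final int POISON = -1;                     // end-of-stream marker

        Thread stage1 = new Thread(() -> {         // stage 1: produce/transform
            try {
                for (int i = 0; i < 8; i++) queue.put(i * i);
                queue.put(POISON);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        Thread stage2 = new Thread(() -> {         // stage 2: consume
            try {
                for (int v; (v = queue.take()) != POISON; )
                    System.out.println("stage 2 got " + v);
            } catch (InterruptedException e) { Thread.currentThread().interrupt(); }
        });
        stage1.start(); stage2.start();
        stage1.join(); stage2.join();
    }
}
```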

    Safe Open-Nested Transactions Through Ownership

    Researchers in transactional memory (TM) have proposed open nesting as a methodology for increasing the concurrency of a program. The idea is to ignore certain "low-level" memory operations of an open-nested transaction when detecting conflicts for its parent transaction, and instead perform abstract concurrency control for the "high-level" operation that the nested transaction represents. To support this methodology, TM systems use an open-nested commit mechanism that commits all changes performed by an open-nested transaction directly to memory, thereby avoiding low-level conflicts. Unfortunately, because the TM runtime is unaware of the different levels of memory, an unconstrained use of open-nested commits can lead to anomalous program behavior. In this paper, we describe a framework of ownership-aware transactional memory which incorporates the notion of modules into the TM system and requires that transactions and data be associated with specific transactional modules, or Xmodules. We propose a new ownership-aware commit mechanism, a hybrid between an open-nested and a closed-nested commit, which commits a piece of data differently depending on whether the current Xmodule owns the data or not. Moreover, we give a set of precise constraints on interactions and sharing of data among the Xmodules based on familiar notions of abstraction. We prove that ownership-aware TM has clean memory-level semantics and can guarantee serializability by modules, which is an adaptation of multilevel serializability from databases to TM. In addition, we describe how a programmer can specify Xmodules and ownership in a Java-like language. Our type system can enforce most of the constraints required by ownership-aware TM statically, and can enforce the remaining constraints dynamically. Finally, we prove that if transactions in the process of aborting obey restrictions on their memory footprint, the OAT model is free from semantic deadlock.
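    As a purely hypothetical illustration of the Xmodule idea (the annotation, class, and method names below are invented, not the paper's actual syntax), an Xmodule can be pictured as a class that owns its data and exposes it only through methods, which run as open-nested transactions with respect to their callers and supply compensating actions for aborts.

```java
// Hypothetical sketch only: @Xmodule is an invented marker annotation,
// and synchronized stands in for the TM runtime's open-nested commits.
@interface Xmodule {}

@Xmodule
class Counter {
    private long count;            // owned by this Xmodule; no outside aliases

    // Conceptually an open-nested transaction: its effect commits globally,
    // while the caller performs abstract concurrency control, compensating
    // with decrement() if the enclosing transaction aborts.
    synchronized void increment() { count++; }
    synchronized void decrement() { count--; }
    synchronized long get()       { return count; }
}
```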
